Overview

Dataset statistics

Number of variables: 18
Number of observations: 2093
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 294.5 KiB
Average record size in memory: 144.1 B
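
The derived figures in this block follow from the raw counts. The short check below is only a sketch: the variable names are illustrative, and it assumes 1 KiB = 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 18
n_observations = 2093
missing_cells = 0
total_size_kib = 294.5

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(total_cells)
print(f"{missing_pct:.1f}%")
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal, the average record size agrees with the 144.1 B reported.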

Variable types

NUM: 14
CAT: 3
BOOL: 1

Reproduction

Analysis started: 2020-12-07 12:58:41.011567
Analysis finished: 2020-12-07 12:59:32.136854
Duration: 51.13 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

DisbursementDate is highly correlated with ApprovalFY (High correlation)
ApprovalFY is highly correlated with DisbursementDate (High correlation)
GrAppv is highly correlated with DisbursementGross and 1 other fields (High correlation)
DisbursementGross is highly correlated with GrAppv and 1 other fields (High correlation)
SBA_Appv is highly correlated with DisbursementGross and 1 other fields (High correlation)
CreateJob has 1226 (58.6%) zeros (Zeros)
RetainedJob has 515 (24.6%) zeros (Zeros)
FranchiseCode has 577 (27.6%) zeros (Zeros)
ChgOffPrinGr has 1398 (66.8%) zeros (Zeros)
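
The percentages in the zero warnings are the zero counts divided by the number of observations. A minimal sketch of that arithmetic follows; the helper name `zero_share` is hypothetical, and the rule the report uses to decide when to raise a warning is not shown here.

```python
# Zero counts copied from the warnings; 2093 is the number of observations.
n_observations = 2093
zero_counts = {
    "CreateJob": 1226,
    "RetainedJob": 515,
    "FranchiseCode": 577,
    "ChgOffPrinGr": 1398,
}


def zero_share(count, total):
    """Return the share of zeros as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_pct = {name: zero_share(c, n_observations) for name, c in zero_counts.items()}

for name, pct in zero_pct.items():
    print(f"{name}: {pct}%")
```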

Variables

Zip
Real number (ℝ≥0)

Distinct count: 810
Unique (%): 38.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92698.30673674152
Minimum: 65757
Maximum: 96161
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 65757
5-th percentile: 90041.6
Q1: 91402
median: 92557
Q3: 94124
95-th percentile: 95682
Maximum: 96161
Range: 30404
Interquartile range (IQR): 2722

Descriptive statistics

Standard deviation: 1876.575796
Coefficient of variation (CV): 0.02024390587
Kurtosis: 20.15777062
Mean: 92698.30674
Median Absolute Deviation (MAD): 1246
Skewness: -1.415556652
Sum: 194017556
Variance: 3521536.718
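
Several of these statistics are derived from the others. As a quick consistency check on the Zip figures, the sketch below recomputes them using the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, mean = sum / count). The variable names are illustrative.

```python
import math

# Values copied from the Zip statistics.
minimum, maximum = 65757, 96161
q1, q3 = 91402, 94124
mean = 92698.30674
std = 1876.575796
total = 194017556
n = 2093

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
mean_from_sum = total / n         # Mean recomputed from Sum and the row count

print(value_range, iqr)
print(cv, variance, mean_from_sum)
```

The recomputed values agree with the reported ones up to the rounding of the inputs.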
Histogram with fixed size bins (bins=10)
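Most frequent values
Value | Count | Frequency (%)
91910 | 14 | 0.7%
92101 | 14 | 0.7%
92618 | 14 | 0.7%
90010 | 13 | 0.6%
92562 | 11 | 0.5%
92701 | 11 | 0.5%
91730 | 11 | 0.5%
93401 | 10 | 0.5%
92109 | 10 | 0.5%
91364 | 10 | 0.5%
92108 | 10 | 0.5%
90045 | 10 | 0.5%
92069 | 10 | 0.5%
92660 | 9 | 0.4%
91352 | 9 | 0.4%
90066 | 9 | 0.4%
91941 | 9 | 0.4%
94070 | 9 | 0.4%
92121 | 9 | 0.4%
92037 | 9 | 0.4%
92807 | 9 | 0.4%
91405 | 8 | 0.4%
95131 | 8 | 0.4%
90004 | 8 | 0.4%
92844 | 8 | 0.4%
Other values (785) | 1841 | 88.0%

Smallest values
Value | Count | Frequency (%)
65757 | 1 | < 0.1%
81301 | 1 | < 0.1%
82037 | 1 | < 0.1%
85008 | 1 | < 0.1%
90001 | 1 | < 0.1%
90003 | 1 | < 0.1%
90004 | 8 | 0.4%
90005 | 5 | 0.2%
90006 | 7 | 0.3%
90007 | 5 | 0.2%

Largest values
Value | Count | Frequency (%)
96161 | 7 | 0.3%
96145 | 2 | 0.1%
96130 | 1 | < 0.1%
96120 | 1 | < 0.1%
96093 | 3 | 0.1%
96080 | 2 | 0.1%
96022 | 1 | < 0.1%
96013 | 1 | < 0.1%
96002 | 2 | 0.1%
96001 | 3 | 0.1%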

ICS
Real number (ℝ≥0)

Distinct count: 24
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 531630.7582417582
Minimum: 531110
Maximum: 533110
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 531110
5-th percentile: 531190
Q1: 531210
median: 531312
Q3: 532230
95-th percentile: 532490
Maximum: 533110
Range: 2000
Interquartile range (IQR): 1020

Descriptive statistics

Standard deviation: 522.086575
Coefficient of variation (CV): 0.0009820473457
Kurtosis: -1.182861806
Mean: 531630.7582
Median Absolute Deviation (MAD): 102
Skewness: 0.6776118544
Sum: 1112703177
Variance: 272574.3918
Histogram with fixed size bins (bins=10)
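Most frequent values
Value | Count | Frequency (%)
531210 | 793 | 37.9%
532230 | 233 | 11.1%
531390 | 171 | 8.2%
531311 | 121 | 5.8%
532490 | 116 | 5.5%
531320 | 86 | 4.1%
532111 | 72 | 3.4%
532299 | 71 | 3.4%
531120 | 62 | 3.0%
531312 | 48 | 2.3%
532412 | 46 | 2.2%
532120 | 43 | 2.1%
532420 | 31 | 1.5%
532292 | 31 | 1.5%
532310 | 28 | 1.3%
531190 | 25 | 1.2%
532210 | 25 | 1.2%
531110 | 23 | 1.1%
533110 | 17 | 0.8%
532220 | 14 | 0.7%
532291 | 14 | 0.7%
532112 | 11 | 0.5%
532411 | 9 | 0.4%
531130 | 3 | 0.1%

Smallest values
Value | Count | Frequency (%)
531110 | 23 | 1.1%
531120 | 62 | 3.0%
531130 | 3 | 0.1%
531190 | 25 | 1.2%
531210 | 793 | 37.9%
531311 | 121 | 5.8%
531312 | 48 | 2.3%
531320 | 86 | 4.1%
531390 | 171 | 8.2%
532111 | 72 | 3.4%

Largest values
Value | Count | Frequency (%)
533110 | 17 | 0.8%
532490 | 116 | 5.5%
532420 | 31 | 1.5%
532412 | 46 | 2.2%
532411 | 9 | 0.4%
532310 | 28 | 1.3%
532299 | 71 | 3.4%
532292 | 31 | 1.5%
532291 | 14 | 0.7%
532230 | 233 | 11.1%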

ApprovalFY
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 23
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2004.032967032967
Minimum: 1989
Maximum: 2011
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 1989
5-th percentile: 1995
Q1: 2003
median: 2005
Q3: 2007
95-th percentile: 2008
Maximum: 2011
Range: 22
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.990591545
Coefficient of variation (CV): 0.001991280388
Kurtosis: 2.948616214
Mean: 2004.032967
Median Absolute Deviation (MAD): 2
Skewness: -1.62594867
Sum: 4194441
Variance: 15.92482088
Histogram with fixed size bins (bins=10)
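Most frequent values
Value | Count | Frequency (%)
2007 | 404 | 19.3%
2006 | 338 | 16.1%
2005 | 246 | 11.8%
2004 | 227 | 10.8%
2003 | 183 | 8.7%
2002 | 133 | 6.4%
2008 | 123 | 5.9%
2001 | 98 | 4.7%
2000 | 42 | 2.0%
1999 | 41 | 2.0%
2010 | 33 | 1.6%
2009 | 28 | 1.3%
1991 | 27 | 1.3%
1998 | 26 | 1.2%
1995 | 24 | 1.1%
1997 | 23 | 1.1%
1996 | 17 | 0.8%
1990 | 15 | 0.7%
1994 | 14 | 0.7%
1993 | 14 | 0.7%
1989 | 13 | 0.6%
1992 | 12 | 0.6%
2011 | 12 | 0.6%

Smallest values
Value | Count | Frequency (%)
1989 | 13 | 0.6%
1990 | 15 | 0.7%
1991 | 27 | 1.3%
1992 | 12 | 0.6%
1993 | 14 | 0.7%
1994 | 14 | 0.7%
1995 | 24 | 1.1%
1996 | 17 | 0.8%
1997 | 23 | 1.1%
1998 | 26 | 1.2%

Largest values
Value | Count | Frequency (%)
2011 | 12 | 0.6%
2010 | 33 | 1.6%
2009 | 28 | 1.3%
2008 | 123 | 5.9%
2007 | 404 | 19.3%
2006 | 338 | 16.1%
2005 | 246 | 11.8%
2004 | 227 | 10.8%
2003 | 183 | 8.7%
2002 | 133 | 6.4%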

Term
Real number (ℝ≥0)

Distinct count: 170
Unique (%): 8.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.0831342570473
Minimum: 0
Maximum: 306
Zeros: 3
Zeros (%): 0.1%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 25
Q1: 60
median: 84
Q3: 240
95-th percentile: 300
Maximum: 306
Range: 306
Interquartile range (IQR): 180

Descriptive statistics

Standard deviation: 93.85850772
Coefficient of variation (CV): 0.7385599062
Kurtosis: -0.8737228175
Mean: 127.0831343
Median Absolute Deviation (MAD): 36
Skewness: 0.824460891
Sum: 265985
Variance: 8809.419472
Histogram with fixed size bins (bins=10)
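Most frequent values
Value | Count | Frequency (%)
84 | 474 | 22.6%
240 | 265 | 12.7%
300 | 253 | 12.1%
120 | 156 | 7.5%
60 | 78 | 3.7%
36 | 42 | 2.0%
59 | 24 | 1.1%
55 | 21 | 1.0%
57 | 19 | 0.9%
63 | 18 | 0.9%
58 | 17 | 0.8%
51 | 17 | 0.8%
61 | 16 | 0.8%
72 | 16 | 0.8%
54 | 16 | 0.8%
64 | 15 | 0.7%
48 | 14 | 0.7%
12 | 14 | 0.7%
62 | 13 | 0.6%
68 | 13 | 0.6%
56 | 13 | 0.6%
42 | 13 | 0.6%
70 | 13 | 0.6%
53 | 12 | 0.6%
96 | 12 | 0.6%
Other values (145) | 529 | 25.3%

Smallest values
Value | Count | Frequency (%)
0 | 3 | 0.1%
1 | 6 | 0.3%
2 | 2 | 0.1%
3 | 2 | 0.1%
4 | 4 | 0.2%
5 | 3 | 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 7 | 0.3%
9 | 5 | 0.2%

Largest values
Value | Count | Frequency (%)
306 | 3 | 0.1%
305 | 1 | < 0.1%
304 | 1 | < 0.1%
303 | 2 | 0.1%
301 | 1 | < 0.1%
300 | 253 | 12.1%
297 | 1 | < 0.1%
294 | 1 | < 0.1%
291 | 1 | < 0.1%
290 | 1 | < 0.1%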

NoEmp
Real number (ℝ≥0)

Distinct count: 83
Unique (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.162446249402771
Minimum: 0
Maximum: 650
Zeros: 10
Zeros (%): 0.5%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 8
95-th percentile: 32
Maximum: 650
Range: 650
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 34.47265084
Coefficient of variation (CV): 3.392160706
Kurtosis: 187.0350313
Mean: 10.16244625
Median Absolute Deviation (MAD): 2
Skewness: 12.29848122
Sum: 21270
Variance: 1188.363656
Histogram with fixed size bins (bins=10)
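Most frequent values
Value | Count | Frequency (%)
1 | 448 | 21.4%
2 | 381 | 18.2%
3 | 221 | 10.6%
4 | 159 | 7.6%
5 | 142 | 6.8%
6 | 106 | 5.1%
10 | 73 | 3.5%
7 | 66 | 3.2%
8 | 64 | 3.1%
12 | 48 | 2.3%
15 | 40 | 1.9%
13 | 32 | 1.5%
9 | 31 | 1.5%
20 | 26 | 1.2%
11 | 24 | 1.1%
16 | 20 | 1.0%
14 | 17 | 0.8%
25 | 12 | 0.6%
30 | 11 | 0.5%
0 | 10 | 0.5%
18 | 9 | 0.4%
40 | 9 | 0.4%
21 | 8 | 0.4%
17 | 8 | 0.4%
50 | 7 | 0.3%
Other values (58) | 121 | 5.8%

Smallest values
Value | Count | Frequency (%)
0 | 10 | 0.5%
1 | 448 | 21.4%
2 | 381 | 18.2%
3 | 221 | 10.6%
4 | 159 | 7.6%
5 | 142 | 6.8%
6 | 106 | 5.1%
7 | 66 | 3.2%
8 | 64 | 3.1%
9 | 31 | 1.5%

Largest values
Value | Count | Frequency (%)
650 | 1 | < 0.1%
600 | 2 | 0.1%
535 | 1 | < 0.1%
450 | 1 | < 0.1%
345 | 1 | < 0.1%
327 | 1 | < 0.1%
244 | 1 | < 0.1%
237 | 1 | < 0.1%
225 | 1 | < 0.1%
220 | 1 | < 0.1%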

NewExist
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.5 KiB
Value | Count | Frequency (%)
1 | 1770 | 84.6%
2 | 322 | 15.4%
0 | 1 | < 0.1%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
0 | 2094 | 33.3%
. | 2093 | 33.3%
1 | 1770 | 28.2%
2 | 322 | 5.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4186 | 66.7%
Other Punctuation | 2093 | 33.3%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
0 | 2094 | 50.0%
1 | 1770 | 42.3%
2 | 322 | 7.7%

Most frequent Other Punctuation characters

Value | Count | Frequency (%)
. | 2093 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 6279 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
0 | 2094 | 33.3%
. | 2093 | 33.3%
1 | 1770 | 28.2%
2 | 322 | 5.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6279 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
0 | 2094 | 33.3%
. | 2093 | 33.3%
1 | 1770 | 28.2%
2 | 322 | 5.1%
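
The character counts for NewExist are consistent with the column having been profiled as three-character strings. The sketch below reproduces them under that assumption: the string forms "1.0", "2.0" and "0.0" are inferred from the reported length of 3 and are not stated in the report. It uses Python's standard `unicodedata` module, where `Nd` is the Decimal Number category and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter

# Assumption: the values were stored as the strings "1.0", "2.0" and "0.0".
# The counts 1770, 322 and 1 come from the NewExist frequency table.
values = ["1.0"] * 1770 + ["2.0"] * 322 + ["0.0"] * 1

# Count every character, then every Unicode general category.
char_counts = Counter(ch for v in values for ch in v)
category_counts = Counter(unicodedata.category(ch) for v in values for ch in v)

print(char_counts.most_common())
print(category_counts.most_common())
```

Under this assumption the character "0" occurs 2094 times: once as the trailing digit of each of the 2093 values, plus once more as the leading digit of the single "0.0".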
 

CreateJob
Real number (ℝ≥0)

ZEROS

Distinct count: 43
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5551839464882944
Minimum: 0
Maximum: 130
Zeros: 1226
Zeros (%): 58.6%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 2
95-th percentile: 10.4
Maximum: 130
Range: 130
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 8.026333221
Coefficient of variation (CV): 3.141195855
Kurtosis: 81.58491365
Mean: 2.555183946
Median Absolute Deviation (MAD): 0
Skewness: 7.799476199
Sum: 5348
Variance: 64.42202498
Histogram with fixed size bins (bins=10)
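Most frequent values
Value | Count | Frequency (%)
0 | 1226 | 58.6%
1 | 226 | 10.8%
2 | 198 | 9.5%
3 | 88 | 4.2%
4 | 78 | 3.7%
5 | 68 | 3.2%
6 | 31 | 1.5%
10 | 30 | 1.4%
8 | 20 | 1.0%
15 | 15 | 0.7%
20 | 14 | 0.7%
7 | 12 | 0.6%
9 | 11 | 0.5%
12 | 10 | 0.5%
25 | 8 | 0.4%
50 | 7 | 0.3%
11 | 5 | 0.2%
18 | 4 | 0.2%
13 | 4 | 0.2%
100 | 3 | 0.1%
75 | 3 | 0.1%
35 | 3 | 0.1%
21 | 2 | 0.1%
14 | 2 | 0.1%
19 | 2 | 0.1%
Other values (18) | 23 | 1.1%

Smallest values
Value | Count | Frequency (%)
0 | 1226 | 58.6%
1 | 226 | 10.8%
2 | 198 | 9.5%
3 | 88 | 4.2%
4 | 78 | 3.7%
5 | 68 | 3.2%
6 | 31 | 1.5%
7 | 12 | 0.6%
8 | 20 | 1.0%
9 | 11 | 0.5%

Largest values
Value | Count | Frequency (%)
130 | 1 | < 0.1%
100 | 3 | 0.1%
75 | 3 | 0.1%
69 | 1 | < 0.1%
65 | 1 | < 0.1%
63 | 1 | < 0.1%
60 | 1 | < 0.1%
50 | 7 | 0.3%
45 | 1 | < 0.1%
40 | 2 | 0.1%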

RetainedJob
Real number (ℝ≥0)

ZEROS

Distinct count: 62
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.814620162446249
Minimum: 0
Maximum: 535
Zeros: 515
Zeros (%): 24.6%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 5
95-th percentile: 20
Maximum: 535
Range: 535
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 19.01559278
Coefficient of variation (CV): 3.27030696
Kurtosis: 350.2461633
Mean: 5.814620162
Median Absolute Deviation (MAD): 2
Skewness: 15.55617554
Sum: 12170
Variance: 361.5927689
Histogram with fixed size bins (bins=10)
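Most frequent values
Value | Count | Frequency (%)
0 | 515 | 24.6%
1 | 362 | 17.3%
2 | 310 | 14.8%
3 | 175 | 8.4%
4 | 131 | 6.3%
5 | 110 | 5.3%
6 | 75 | 3.6%
8 | 56 | 2.7%
10 | 51 | 2.4%
7 | 45 | 2.2%
12 | 38 | 1.8%
15 | 20 | 1.0%
20 | 19 | 0.9%
9 | 18 | 0.9%
11 | 18 | 0.9%
13 | 17 | 0.8%
16 | 13 | 0.6%
14 | 11 | 0.5%
30 | 8 | 0.4%
17 | 8 | 0.4%
25 | 5 | 0.2%
21 | 5 | 0.2%
18 | 5 | 0.2%
19 | 4 | 0.2%
22 | 4 | 0.2%
Other values (37) | 70 | 3.3%

Smallest values
Value | Count | Frequency (%)
0 | 515 | 24.6%
1 | 362 | 17.3%
2 | 310 | 14.8%
3 | 175 | 8.4%
4 | 131 | 6.3%
5 | 110 | 5.3%
6 | 75 | 3.6%
7 | 45 | 2.2%
8 | 56 | 2.7%
9 | 18 | 0.9%

Largest values
Value | Count | Frequency (%)
535 | 1 | < 0.1%
327 | 1 | < 0.1%
244 | 1 | < 0.1%
220 | 1 | < 0.1%
150 | 2 | 0.1%
130 | 2 | 0.1%
102 | 1 | < 0.1%
100 | 1 | < 0.1%
88 | 1 | < 0.1%
85 | 2 | 0.1%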

FranchiseCode
Real number (ℝ≥0)

ZEROS

Distinct count: 33
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1970.1920688007644
Minimum: 0
Maximum: 89658
Zeros: 577
Zeros (%): 27.6%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 1
95-th percentile: 1
Maximum: 89658
Range: 89658
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 11263.23066
Coefficient of variation (CV): 5.7168186
Kurtosis: 37.09410803
Mean: 1970.192069
Median Absolute Deviation (MAD): 0
Skewness: 6.105100645
Sum: 4123612
Variance: 126860365
Histogram with fixed size bins (bins=10)
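Most frequent values
Value | Count | Frequency (%)
1 | 1437 | 68.7%
0 | 577 | 27.6%
15710 | 10 | 0.5%
68905 | 8 | 0.4%
68060 | 7 | 0.3%
88407 | 6 | 0.3%
69149 | 6 | 0.3%
37481 | 5 | 0.2%
18000 | 3 | 0.1%
69560 | 3 | 0.1%
68880 | 3 | 0.1%
67219 | 3 | 0.1%
79897 | 2 | 0.1%
1357 | 2 | 0.1%
10533 | 2 | 0.1%
28236 | 2 | 0.1%
33747 | 1 | < 0.1%
49952 | 1 | < 0.1%
26685 | 1 | < 0.1%
10465 | 1 | < 0.1%
41298 | 1 | < 0.1%
44890 | 1 | < 0.1%
43571 | 1 | < 0.1%
78760 | 1 | < 0.1%
88905 | 1 | < 0.1%
Other values (8) | 8 | 0.4%

Smallest values
Value | Count | Frequency (%)
0 | 577 | 27.6%
1 | 1437 | 68.7%
1357 | 2 | 0.1%
10465 | 1 | < 0.1%
10533 | 2 | 0.1%
15100 | 1 | < 0.1%
15710 | 10 | 0.5%
18000 | 3 | 0.1%
18160 | 1 | < 0.1%
26685 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
89658 | 1 | < 0.1%
88905 | 1 | < 0.1%
88902 | 1 | < 0.1%
88407 | 6 | 0.3%
87075 | 1 | < 0.1%
81800 | 1 | < 0.1%
79897 | 2 | 0.1%
78760 | 1 | < 0.1%
69560 | 3 | 0.1%
69149 | 6 | 0.3%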

UrbanRural
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.5 KiB
Value | Count | Frequency (%)
1 | 1736 | 82.9%
0 | 230 | 11.0%
2 | 127 | 6.1%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
1 | 1736 | 82.9%
0 | 230 | 11.0%
2 | 127 | 6.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2093 | 100.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
1 | 1736 | 82.9%
0 | 230 | 11.0%
2 | 127 | 6.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2093 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
1 | 1736 | 82.9%
0 | 230 | 11.0%
2 | 127 | 6.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2093 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
1 | 1736 | 82.9%
0 | 230 | 11.0%
2 | 127 | 6.1%
 

RevLineCr
Categorical

Distinct count: 4
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 16.5 KiB
Value | Count | Frequency (%)
1 | 733 | 35.0%
0 | 729 | 34.8%
2 | 578 | 27.6%
3 | 53 | 2.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
1 | 733 | 35.0%
0 | 729 | 34.8%
2 | 578 | 27.6%
3 | 53 | 2.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2093 | 100.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
1 | 733 | 35.0%
0 | 729 | 34.8%
2 | 578 | 27.6%
3 | 53 | 2.5%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2093 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
1 | 733 | 35.0%
0 | 729 | 34.8%
2 | 578 | 27.6%
3 | 53 | 2.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2093 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
1 | 733 | 35.0%
0 | 729 | 34.8%
2 | 578 | 27.6%
3 | 53 | 2.5%
 

LowDoc
Real number (ℝ≥0)

Distinct count: 5
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0234113712374582
Minimum: 0
Maximum: 4
Zeros: 1
Zeros (%): < 0.1%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 1
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1719487379
Coefficient of variation (CV): 0.16801527
Kurtosis: 86.20940044
Mean: 1.023411371
Median Absolute Deviation (MAD): 0
Skewness: 8.1530416
Sum: 2142
Variance: 0.02956636846
Histogram with fixed size bins (bins=10)
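Most frequent values
Value | Count | Frequency (%)
1 | 2047 | 97.8%
2 | 41 | 2.0%
3 | 3 | 0.1%
4 | 1 | < 0.1%
0 | 1 | < 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 2047 | 97.8%
2 | 41 | 2.0%
3 | 3 | 0.1%
4 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
4 | 1 | < 0.1%
3 | 3 | 0.1%
2 | 41 | 2.0%
1 | 2047 | 97.8%
0 | 1 | < 0.1%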

DisbursementDate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 22
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2004.0363115145724
Minimum: 1989
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 1989
5-th percentile: 1995
Q1: 2003
median: 2005
Q3: 2007
95-th percentile: 2008
Maximum: 2010
Range: 21
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.96008046
Coefficient of variation (CV): 0.001976052249
Kurtosis: 2.982347253
Mean: 2004.036312
Median Absolute Deviation (MAD): 2
Skewness: -1.647102511
Sum: 4194448
Variance: 15.68223725
Histogram with fixed size bins (bins=10)
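Most frequent values
Value | Count | Frequency (%)
2007 | 405 | 19.4%
2006 | 387 | 18.5%
2005 | 222 | 10.6%
2004 | 218 | 10.4%
2003 | 189 | 9.0%
2002 | 139 | 6.6%
2001 | 88 | 4.2%
2008 | 86 | 4.1%
2009 | 49 | 2.3%
1999 | 45 | 2.2%
2000 | 40 | 1.9%
2010 | 40 | 1.9%
1998 | 28 | 1.3%
1997 | 23 | 1.1%
1991 | 21 | 1.0%
1995 | 21 | 1.0%
1996 | 19 | 0.9%
1990 | 17 | 0.8%
1992 | 16 | 0.8%
1993 | 15 | 0.7%
1994 | 13 | 0.6%
1989 | 12 | 0.6%

Smallest values
Value | Count | Frequency (%)
1989 | 12 | 0.6%
1990 | 17 | 0.8%
1991 | 21 | 1.0%
1992 | 16 | 0.8%
1993 | 15 | 0.7%
1994 | 13 | 0.6%
1995 | 21 | 1.0%
1996 | 19 | 0.9%
1997 | 23 | 1.1%
1998 | 28 | 1.3%

Largest values
Value | Count | Frequency (%)
2010 | 40 | 1.9%
2009 | 49 | 2.3%
2008 | 86 | 4.1%
2007 | 405 | 19.4%
2006 | 387 | 18.5%
2005 | 222 | 10.6%
2004 | 218 | 10.4%
2003 | 189 | 9.0%
2002 | 139 | 6.6%
2001 | 88 | 4.2%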

DisbursementGross
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1180
Unique (%): 56.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243108.55327281414
Minimum: 4835
Maximum: 2315000
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 4835
5-th percentile: 10000
Q1: 40212
median: 100000
Q3: 300000
95-th percentile: 1000000
Maximum: 2315000
Range: 2310165
Interquartile range (IQR): 259788

Descriptive statistics

Standard deviation: 338593.0288
Coefficient of variation (CV): 1.392764772
Kurtosis: 6.248458527
Mean: 243108.5533
Median Absolute Deviation (MAD): 75000
Skewness: 2.36560185
Sum: 508826202
Variance: 1.146452392e+11
Histogram with fixed size bins (bins=10)
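Most frequent values
Value | Count | Frequency (%)
50000 | 118 | 5.6%
25000 | 71 | 3.4%
100000 | 67 | 3.2%
10000 | 67 | 3.2%
150000 | 37 | 1.8%
35000 | 35 | 1.7%
15000 | 28 | 1.3%
5000 | 27 | 1.3%
20000 | 24 | 1.1%
30000 | 21 | 1.0%
200000 | 19 | 0.9%
250000 | 15 | 0.7%
300000 | 14 | 0.7%
60000 | 14 | 0.7%
1000000 | 14 | 0.7%
75000 | 14 | 0.7%
500000 | 13 | 0.6%
45000 | 12 | 0.6%
65000 | 11 | 0.5%
80000 | 10 | 0.5%
40000 | 10 | 0.5%
55000 | 10 | 0.5%
105000 | 8 | 0.4%
135000 | 8 | 0.4%
70000 | 8 | 0.4%
Other values (1155) | 1418 | 67.7%

Smallest values
Value | Count | Frequency (%)
4835 | 1 | < 0.1%
4999 | 1 | < 0.1%
5000 | 27 | 1.3%
5292 | 1 | < 0.1%
5600 | 1 | < 0.1%
6000 | 3 | 0.1%
6861 | 1 | < 0.1%
7000 | 1 | < 0.1%
7319 | 1 | < 0.1%
7500 | 7 | 0.3%

Largest values
Value | Count | Frequency (%)
2315000 | 1 | < 0.1%
2000000 | 6 | 0.3%
1999000 | 1 | < 0.1%
1944300 | 1 | < 0.1%
1899500 | 1 | < 0.1%
1836000 | 1 | < 0.1%
1799000 | 1 | < 0.1%
1682000 | 1 | < 0.1%
1665000 | 2 | 0.1%
1656500 | 1 | < 0.1%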

MIS_Status
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.5 KiB
Value | Count | Frequency (%)
0 | 1408 | 67.3%
1 | 685 | 32.7%

ChgOffPrinGr
Real number (ℝ≥0)

ZEROS

Distinct count: 613
Unique (%): 29.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20110.022933588152
Minimum: 0
Maximum: 1509550
Zeros: 1398
Zeros (%): 66.8%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 16332
95-th percentile: 82781.2
Maximum: 1509550
Range: 1509550
Interquartile range (IQR): 16332

Descriptive statistics

Standard deviation: 75584.08943
Coefficient of variation (CV): 3.758528256
Kurtosis: 144.9703881
Mean: 20110.02293
Median Absolute Deviation (MAD): 0
Skewness: 10.27062743
Sum: 42090278
Variance: 5712954575
Histogram with fixed size bins (bins=10)
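Most frequent values
Value | Count | Frequency (%)
0 | 1398 | 66.8%
50000 | 24 | 1.1%
35000 | 17 | 0.8%
10000 | 9 | 0.4%
100000 | 9 | 0.4%
15000 | 5 | 0.2%
20000 | 4 | 0.2%
30000 | 3 | 0.1%
34000 | 3 | 0.1%
5000 | 3 | 0.1%
40000 | 3 | 0.1%
25000 | 3 | 0.1%
9800 | 2 | 0.1%
14840 | 2 | 0.1%
49500 | 2 | 0.1%
34500 | 2 | 0.1%
85000 | 2 | 0.1%
4331 | 2 | 0.1%
49950 | 2 | 0.1%
49750 | 2 | 0.1%
75000 | 2 | 0.1%
49077 | 2 | 0.1%
45000 | 2 | 0.1%
36988 | 1 | < 0.1%
38398 | 1 | < 0.1%
Other values (588) | 588 | 28.1%

Smallest values
Value | Count | Frequency (%)
0 | 1398 | 66.8%
161 | 1 | < 0.1%
415 | 1 | < 0.1%
1360 | 1 | < 0.1%
1494 | 1 | < 0.1%
1547 | 1 | < 0.1%
1580 | 1 | < 0.1%
1963 | 1 | < 0.1%
2242 | 1 | < 0.1%
2415 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
1509550 | 1 | < 0.1%
1255175 | 1 | < 0.1%
1058672 | 1 | < 0.1%
800054 | 1 | < 0.1%
776318 | 1 | < 0.1%
642404 | 1 | < 0.1%
634549 | 1 | < 0.1%
586026 | 1 | < 0.1%
573429 | 1 | < 0.1%
552478 | 1 | < 0.1%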
 

GrAppv
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 659
Unique (%): 31.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 233398.54706163402
Minimum: 4500
Maximum: 2350000
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 4500
5-th percentile: 10000
Q1: 30000
median: 63000
Q3: 300000
95-th percentile: 1000000
Maximum: 2350000
Range: 2345500
Interquartile range (IQR): 270000

Descriptive statistics

Standard deviation: 343962.8197
Coefficient of variation (CV): 1.473714486
Kurtosis: 6.06429949
Mean: 233398.5471
Median Absolute Deviation (MAD): 50500
Skewness: 2.344334224
Sum: 488503159
Variance: 1.183104213e+11
Histogram with fixed size bins (bins=10)
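Most frequent values
Value | Count | Frequency (%)
50000 | 251 | 12.0%
10000 | 138 | 6.6%
25000 | 135 | 6.5%
100000 | 99 | 4.7%
35000 | 91 | 4.3%
30000 | 79 | 3.8%
20000 | 58 | 2.8%
15000 | 56 | 2.7%
150000 | 40 | 1.9%
5000 | 39 | 1.9%
40000 | 31 | 1.5%
45000 | 24 | 1.1%
60000 | 20 | 1.0%
200000 | 18 | 0.9%
250000 | 17 | 0.8%
75000 | 16 | 0.8%
1000000 | 16 | 0.8%
300000 | 15 | 0.7%
500000 | 15 | 0.7%
85000 | 9 | 0.4%
80000 | 9 | 0.4%
125000 | 8 | 0.4%
135000 | 8 | 0.4%
37000 | 7 | 0.3%
90000 | 7 | 0.3%
Other values (634) | 887 | 42.4%

Smallest values
Value | Count | Frequency (%)
4500 | 1 | < 0.1%
5000 | 39 | 1.9%
5500 | 1 | < 0.1%
6000 | 5 | 0.2%
7000 | 1 | < 0.1%
7500 | 4 | 0.2%
8000 | 2 | 0.1%
8600 | 1 | < 0.1%
10000 | 138 | 6.6%
10500 | 2 | 0.1%

Largest values
Value | Count | Frequency (%)
2350000 | 1 | < 0.1%
2000000 | 6 | 0.3%
1999000 | 1 | < 0.1%
1944300 | 1 | < 0.1%
1899500 | 1 | < 0.1%
1836000 | 1 | < 0.1%
1799000 | 1 | < 0.1%
1682000 | 1 | < 0.1%
1665000 | 2 | 0.1%
1656500 | 1 | < 0.1%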

SBA_Appv
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 754
Unique (%): 36.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189470.24175824175
Minimum: 2250
Maximum: 2115000
Zeros: 0
Zeros (%): 0.0%
Memory size: 16.5 KiB

Quantile statistics

Minimum: 2250
5-th percentile: 5000
Q1: 15000
median: 42500
Q3: 240000
95-th percentile: 850000
Maximum: 2115000
Range: 2112750
Interquartile range (IQR): 225000

Descriptive statistics

Standard deviation: 299244.2617
Coefficient of variation (CV): 1.579373409
Kurtosis: 6.570152426
Mean: 189470.2418
Median Absolute Deviation (MAD): 36125
Skewness: 2.42137024
Sum: 396561216
Variance: 8.954712815e+10
Histogram with fixed size bins (bins=10)
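Most frequent values
Value | Count | Frequency (%)
25000 | 230 | 11.0%
5000 | 116 | 5.5%
12500 | 109 | 5.2%
17500 | 84 | 4.0%
15000 | 75 | 3.6%
50000 | 73 | 3.5%
10000 | 52 | 2.5%
7500 | 45 | 2.2%
20000 | 30 | 1.4%
22500 | 27 | 1.3%
2500 | 25 | 1.2%
8500 | 22 | 1.1%
127500 | 21 | 1.0%
21250 | 20 | 1.0%
75000 | 19 | 0.9%
42500 | 15 | 0.7%
225000 | 14 | 0.7%
4250 | 13 | 0.6%
12750 | 12 | 0.6%
40000 | 12 | 0.6%
375000 | 12 | 0.6%
30000 | 11 | 0.5%
150000 | 10 | 0.5%
187500 | 10 | 0.5%
90000 | 10 | 0.5%
Other values (729) | 1026 | 49.0%

Smallest values
Value | Count | Frequency (%)
2250 | 1 | < 0.1%
2500 | 25 | 1.2%
2750 | 1 | < 0.1%
3000 | 4 | 0.2%
3500 | 1 | < 0.1%
4000 | 2 | 0.1%
4250 | 13 | 0.6%
4300 | 1 | < 0.1%
4500 | 1 | < 0.1%
5000 | 116 | 5.5%

Largest values
Value | Count | Frequency (%)
2115000 | 1 | < 0.1%
1999000 | 1 | < 0.1%
1836000 | 1 | < 0.1%
1799000 | 1 | < 0.1%
1682000 | 1 | < 0.1%
1655000 | 1 | < 0.1%
1620000 | 1 | < 0.1%
1583000 | 1 | < 0.1%
1512000 | 1 | < 0.1%
1500000 | 5 | 0.2%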

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
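
As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations using only the standard library. The function name and the toy data are illustrative and do not come from the profiled dataset. It uses population (divide-by-n) formulas; the choice of n or n - 1 cancels in the ratio.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Hypothetical toy data: ys is an exact linear function of xs.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
r_up = pearson_r(xs, ys)
# Shifting and rescaling ys changes neither the linearity nor the magnitude of r.
r_shifted = pearson_r(xs, [10 + 0.5 * y for y in ys])
# Reversing the direction of the relationship flips the sign.
r_down = pearson_r(xs, [-y for y in ys])
print(r_up, r_shifted, r_down)
```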

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
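
In other words, ρ is Pearson's r applied to ranks. The sketch below illustrates this with the standard library only. The helper names and the toy data are hypothetical, and giving tied values the average of their ranks is an assumption made here, not something the report specifies.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: monotonic but nonlinear (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this toy data the ranks of xs and ys are identical, so ρ reaches 1, while Pearson's r stays below 1 because the relationship is not linear.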

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
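
The pair-counting definition can be written down directly. The sketch below implements exactly the formula in the text, which is often called tau-a and applies no correction for ties; statistical libraries frequently use a tie-corrected variant, so results can differ when ties are present. The function name and the toy data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables order it the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical toy data: one of the six pairs is out of order.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.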

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
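
For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table, following the correction as it is commonly stated for Bergsma (2013): φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at 0, and the row and column counts r and k are replaced by r - (r-1)²/(n-1) and k - (k-1)²/(n-1). This is an assumption about the formula, not a copy of the report's implementation, and the tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column has at least one observation.
    """
    r = len(table)
    k = len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical tables: one with independent variables, one perfectly associated.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
print(v_independent, v_perfect)
```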

Missing values

Sample

First rows

(index) | Zip | ICS | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | UrbanRural | RevLineCr | LowDoc | DisbursementDate | DisbursementGross | MIS_Status | ChgOffPrinGr | GrAppv | SBA_Appv
09280153242020013611.0001011200132812003000015000
19050553121020015611.0001011200330000003000015000
292103531210200136101.0001011200130000003000015000
39210853131220033661.0001011200350000005000025000
4913455313902006240651.03651101200634300000343000343000
59583153121020038411.0001011200355825005000025000
690255531210200626921.002110120062975001247074297500223125
79080853121020068412.0211211200667047003000015000
89270453139020042251.00011212004500001353335000025000
99458353132020048442.000111120041750000100005000

Last rows

(index) | Zip | ICS | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | UrbanRural | RevLineCr | LowDoc | DisbursementDate | DisbursementGross | MIS_Status | ChgOffPrinGr | GrAppv | SBA_Appv
20839202153139020063222.00212112006763531235943000015000
20849064053121020065422.000111120061295019994100005000
20859261853139020068431.003111120061057660010000050000
20869190253121020068431.0031211200692502003000015000
208795112531210200624061.0461101200772100000721000721000
2088913315323102006240281.08281101200610290000010290001029000
20899234653223020066052.005110120061500000015000075000
209092021532120199730041.0001001199799000009900079200
20919301253212019978421.0001001199750000005000040000
209291352532120199712031.0001001199725115000500000375000